Reliable classification of two-class cancer data using evolutionary algorithms.
Abstract
In bioinformatics, identifying the gene subsets responsible for classifying available disease samples into two or more variants of a disease is an important task. Such problems have been solved in the past by means of unsupervised learning methods (hierarchical clustering, self-organizing maps, k-means clustering, etc.) and supervised learning methods (weighted voting approach, k-nearest neighbor method, support vector machine method, etc.). They can also be posed as optimization problems of minimizing the gene subset size while achieving reliable and accurate classification. The main difficulties in solving the resulting optimization problem are the availability of only a few samples compared to the number of genes in the samples and the exorbitantly large search space of solutions. Although a few applications of evolutionary algorithms (EAs) to this task exist, here we treat the problem as a multiobjective optimization problem of minimizing the gene subset size and minimizing the number of misclassified samples. Moreover, for a more reliable classification, we consider multiple training sets in evaluating a classifier. In contrast to past studies, the use of a multiobjective EA (NSGA-II) has enabled us to discover smaller gene subsets (of size four or five, for example) that correctly classify 100% or nearly 100% of the samples for three cancer data sets (Leukemia, Lymphoma, and Colon). We have also extended NSGA-II to obtain multiple non-dominated solutions, discovering as many as 352 different three-gene combinations that provide 100% correct classification of the Leukemia data. To gain further confidence in the identification task, we have also introduced a prediction strength threshold for determining whether a sample belongs to one class or the other. All simulation results show consistent gene subset identifications on the three data sets and demonstrate the flexibility and efficacy of using a multiobjective EA for the gene subset identification task.
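The two objectives named in the abstract (gene subset size and number of misclassified samples, both minimized) can be illustrated with a small Pareto-dominance sketch. This is not the authors' NSGA-II implementation. It only shows the non-dominated filtering idea that such algorithms rely on, and the function names and objective values are hypothetical, chosen for illustration.

```python
from typing import List, Tuple

# Each candidate is summarized as (gene_subset_size, misclassified_samples).
# Both objectives are minimized. The type alias is an illustrative assumption.
Objectives = Tuple[int, int]


def dominates(a: Objectives, b: Objectives) -> bool:
    """True if a is no worse than b in every objective and strictly better in one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def non_dominated(points: List[Objectives]) -> List[Objectives]:
    """Return the points that no other point dominates (the first front)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical objective values, invented purely for illustration;
# they are not results from the paper.
candidates = [(3, 0), (4, 0), (2, 1), (5, 2), (1, 4), (2, 3)]
front = non_dominated(candidates)
print(front)
```

Two candidates with identical objective values do not dominate each other, so a filter of this kind keeps both. That is one way to see how many different gene combinations of the same size and accuracy can coexist as non-dominated solutions.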
Similar resources
Efficient Data Mining with Evolutionary Algorithms for Cloud Computing Application
With the rapid development of the internet, the amount of information and data being produced is extremely large. Hence, clients can be confused by the huge amount of data, and it is difficult to tell which data are useful. Data mining can overcome this problem. When data mining is run on cloud computing, it reduces processing time, energy usage, and costs. As the speed of ...
ADABOOST ENSEMBLE ALGORITHMS FOR BREAST CANCER CLASSIFICATION
With advances in technology, different tumor features have been collected for Breast Cancer (BC) diagnosis. Processing large data sets poses challenges, including the high storage capacity and the time required for accessing and processing them. The objective of this paper is to classify BC based on the extracted tumor features. To extract useful information and diagnose the tumo...
Proposing a Novel Cost Sensitive Imbalanced Classification Method based on Hybrid of New Fuzzy Cost Assigning Approaches, Fuzzy Clustering and Evolutionary Algorithms
In this paper, a new hybrid methodology is introduced to design a cost-sensitive fuzzy rule-based classification system. A novel cost metric is proposed based on the combination of three different concepts: Entropy, Gini index and DKM criterion. In order to calculate the effective cost of patterns, a hybrid of fuzzy c-means clustering and particle swarm optimization algorithm is utilized. This ...
Soft Computing Methods based on Fuzzy, Evolutionary and Swarm Intelligence for Analysis of Digital Mammography Images for Diagnosis of Breast Tumors
Soft computing models based on intelligent fuzzy systems are capable of managing uncertainty in image-based disease diagnosis. Analysis and classification of breast tumors are critical for early diagnosis of breast cancer, a common cancer with a high mortality rate among women all around the world. Soft computing models based on fuzzy and evolutionary algorithms play...
Breast Cancer Diagnosis from Perspective of Class Imbalance
Introduction: Breast cancer is the second cause of mortality among women. Early detection is the only way to reduce the risk of breast cancer mortality. Traditional methods cannot effectively diagnose tumors since they are based on the assumption of a well-balanced dataset. However, a hybrid method can help to alleviate the two-class imbalance problem existing in the ...
Negative Selection Based Data Classification with Flexible Boundaries
One of the most important artificial immune algorithms is the negative selection algorithm, an anomaly detection and pattern recognition technique; recent research has also shown the successful application of this algorithm in data classification. Most negative selection methods use deterministic boundaries to distinguish between self and non-self spaces. In this paper, two...
Journal title:
- Bio Systems
Volume 72, Issue 1-2
Pages: -
Publication date: 2003